Efficient Mining of Repetitions in Large-Scale TV Streams with Product Quantization Hashing
Abstract
Mining duplicates or near-duplicates in video sequences is of broad interest to many multimedia applications. Designing an effective and scalable system, however, remains a challenge for the community. In this paper, we present a method to detect recurrent sequences in large-scale TV streams in an unsupervised manner and with little a priori knowledge of the content. The method relies on a product k-means quantizer that efficiently produces hash keys for frame descriptors, adapted to the data distribution. This hashing technique, combined with a temporal consistency check, allows the detection of meaningful repetitions in TV streams. When considering all frames (about 47 million) of a 22-day-long TV broadcast, our system detects all repetitions in 15 minutes, excluding the computation of the frame descriptors. Experimental results show that our approach is a promising way to deal with very large video databases.
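The two ideas named in the abstract, product k-means hashing of frame descriptors and a temporal consistency check, can be illustrated with a minimal sketch. This is not the authors' implementation: the descriptor dimension, the number of sub-vectors `m`, the codebook size `k`, the minimum run length `min_len`, and all function names are assumptions chosen for illustration, and the consistency check used here (runs of consecutive frames that collide at a constant time offset) is only one plausible reading of the abstract.

```python
from collections import defaultdict

import numpy as np


def kmeans(x, k, iters=10, seed=0):
    """Plain Lloyd's k-means; returns a (k, d) array of centroids."""
    rng = np.random.default_rng(seed)
    centroids = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(iters):
        dists = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for c in range(k):
            members = x[labels == c]
            if len(members):  # keep the old centroid if the cluster is empty
                centroids[c] = members.mean(axis=0)
    return centroids


def train_product_quantizer(x, m=4, k=8):
    """Learn one k-means codebook per sub-vector (m sub-vectors in total)."""
    parts = np.split(x, m, axis=1)  # requires x.shape[1] divisible by m
    return [kmeans(part, k, seed=i) for i, part in enumerate(parts)]


def hash_keys(x, codebooks):
    """Map each descriptor to an integer key built from its centroid indices."""
    k = len(codebooks[0])
    keys = np.zeros(len(x), dtype=np.int64)
    parts = np.split(x, len(codebooks), axis=1)
    for part, cb in zip(parts, codebooks):
        dists = ((part[:, None, :] - cb[None, :, :]) ** 2).sum(axis=2)
        keys = keys * k + dists.argmin(axis=1)
    return keys


def find_repetitions(keys, min_len=10):
    """Return (start_a, start_b, length) for runs colliding at a constant offset."""
    buckets = defaultdict(list)
    for t, key in enumerate(keys):
        buckets[int(key)].append(t)

    # Group every colliding frame pair (a, b), a < b, by its offset b - a.
    by_offset = defaultdict(list)
    for frames in buckets.values():
        for pos, a in enumerate(frames):
            for b in frames[pos + 1:]:
                by_offset[b - a].append(a)

    # Temporal consistency: keep only runs of consecutive frames per offset.
    runs = []
    for offset, starts in by_offset.items():
        starts.sort()
        run_start = prev = starts[0]
        for s in starts[1:] + [None]:  # None flushes the last run
            if s is not None and s == prev + 1:
                prev = s
                continue
            length = prev - run_start + 1
            if length >= min_len:
                runs.append((run_start, run_start + offset, length))
            run_start = prev = s
    return sorted(runs)
```

A toy usage, with synthetic descriptors standing in for real frame features: build a random stream, copy frames 50 to 79 onto frames 200 to 229, train the codebooks on the stream, and call `find_repetitions(hash_keys(stream, codebooks))`. Because identical descriptors always receive identical keys, the copied segment shows up as a run at offset 150. The pairwise loop inside each bucket is quadratic in bucket size, so a system at the scale described in the abstract would need a more careful design than this sketch.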
Related resources
Adaptive Quantization for Hashing: An Information-Based Approach to Learning Binary Codes
Large-scale data mining and retrieval applications have increasingly turned to compact binary data representations as a way to achieve both fast queries and efficient data storage; many algorithms have been proposed for learning effective binary encodings. Most of these algorithms focus on learning a set of projection hyperplanes for the data and simply binarizing the result from each hyperplan...
Deep Quantization Network for Efficient Image Retrieval
Hashing has been widely applied to approximate nearest neighbor search for large-scale multimedia retrieval. Supervised hashing improves the quality of hash coding by exploiting the semantic similarity on data pairs and has received increasing attention recently. For most existing supervised hashing methods for image retrieval, an image is first represented as a vector of hand-crafted or machin...
Simultaneous Compression and Quantization: A Joint Approach for Efficient Unsupervised Hashing
The two most important requirements for unsupervised data-dependent hashing methods are to preserve similarity in the low-dimensional feature space and to minimize the binary quantization loss. Even though there are many hashing methods that have been proposed in the literature, there is room for improvement to address both requirements simultaneously and adequately. In this paper, we propose a...
Approximate Nearest Neighbor Search by Residual Vector Quantization
A recently proposed product quantization method is efficient for large scale approximate nearest neighbor search, however, its performance on unstructured vectors is limited. This paper introduces residual vector quantization based approaches that are appropriate for unstructured vectors. Database vectors are quantized by residual vector quantizer. The reproductions are represented by short cod...
Min-wise independent sampling from skewed data streams
Min-wise independent hashing is a powerful sampling technique for estimating the similarity between sets. In particular, it has proved to be ubiquitous for mining data streams of large volume where the input sets are revealed in arbitrary order and the elements in a given set do not arrive consecutively. More precisely, for sets of elements E and attributes A the input is a stream of element-at...
Publication date: 2012